53 research outputs found

    Single Cell Training on Architecture Search for Image Denoising

    Full text link
    Neural Architecture Search (NAS) for automatically finding the optimal network architecture has shown some success with competitive performances in various computer vision tasks. However, NAS in general requires a tremendous amount of computations. Thus reducing computational cost has emerged as an important issue. Most of the attempts so far has been based on manual approaches, and often the architectures developed from such efforts dwell in the balance of the network optimality and the search cost. Additionally, recent NAS methods for image restoration generally do not consider dynamic operations that may transform dimensions of feature maps because of the dimensionality mismatch in tensor calculations. This can greatly limit NAS in its search for optimal network structure. To address these issues, we re-frame the optimal search problem by focusing at component block level. From previous work, it's been shown that an effective denoising block can be connected in series to further improve the network performance. By focusing at block level, the search space of reinforcement learning becomes significantly smaller and evaluation process can be conducted more rapidly. In addition, we integrate an innovative dimension matching modules for dealing with spatial and channel-wise mismatch that may occur in the optimal design search. This allows much flexibility in optimal network search within the cell block. With these modules, then we employ reinforcement learning in search of an optimal image denoising network at a module level. Computational efficiency of our proposed Denoising Prior Neural Architecture Search (DPNAS) was demonstrated by having it complete an optimal architecture search for an image restoration task by just one day with a single GPU

    Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition

    Full text link
    Recently, skeleton-based human action has become a hot research topic because the compact representation of human skeletons brings new blood to this research domain. As a result, researchers began to notice the importance of using RGB or other sensors to analyze human action by extracting skeleton information. Leveraging the rapid development of deep learning (DL), a significant number of skeleton-based human action approaches have been presented with fine-designed DL structures recently. However, a well-trained DL model always demands high-quality and sufficient data, which is hard to obtain without costing high expenses and human labor. In this paper, we introduce a novel data augmentation method for skeleton-based action recognition tasks, which can effectively generate high-quality and diverse sequential actions. In order to obtain natural and realistic action sequences, we propose denoising diffusion probabilistic models (DDPMs) that can generate a series of synthetic action sequences, and their generation process is precisely guided by a spatial-temporal transformer (ST-Trans). Experimental results show that our method outperforms the state-of-the-art (SOTA) motion generation approaches on different naturality and diversity metrics. It proves that its high-quality synthetic data can also be effectively deployed to existing action recognition models with significant performance improvement

    A New Feature Normalization Scheme Based on Eigenspace for Noisy Speech Recognition

    Get PDF
    Abstract. We propose a new feature normalization scheme based on eigenspace, for achieving robust speech recognition. In particular, we employ the Mean and Variance Normalization (MVN) in eigenspace using unique and independent eigenspaces to cepstra, delta and delta-delta cepstra respectively. We also normalize training data in eigenspace and get the model from the normalized training data. In addition, a feature space rotation procedure is introduced to reduce the mismatch of training and test data distribution in noisy condition. As a result, we obtain a substantial recognition improvement over the basic eigenspace normalization

    Topological mappings of video and audio data

    Get PDF
    We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM).1 But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts.2 We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels. Finally we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of mean squared error between the prototypes and the data. This leads us to suggest a new algorithm which incorporates local and global information in the clustering. Both ot the new algorithms achieve better results than the standard Self-Organizing Map
    • …
    corecore